Average optimality for continuous-time Markov decision processes with a policy iteration approach


Similar resources

Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces

(ii) A is an action space, which is also supposed to be a Polish space, and A(x) is a Borel set which denotes the set of available actions at state x ∈ S. The set K := {(x, a) : x ∈ S, a ∈ A(x)} is assumed to be a Borel subset of S × A. (iii) q(· | x, a) denotes the transition rates, and they are supposed to satisfy the following properties: for each (x, a) ∈ K and D ∈ B(S), (Q1) D → q...

Online Markov decision processes with policy iteration

The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions. In this paper, we propose practical online MDP algorithms with policy iteration and theoretically establish a sublinear regret bound. A notable advantage of the proposed algorithm is that it can be easily combined with function approximation, and thu...

Continuous-time Markov decision processes with nth-bias optimality criteria

In this paper, we study the nth-bias optimality problem for finite continuous-time Markov decision processes (MDPs) with a multichain structure. We first provide nth-bias difference formulas for two policies and present some interesting characterizations of an nth-bias optimal policy by using these difference formulas. Then, we prove the existence of an nth-bias optimal policy by using nth-bias...

The Policy Iteration Algorithm for Average Reward Markov Decision Processes with General State Space

The average cost optimal control problem is addressed for Markov decision processes with unbounded cost. It is found that the policy iteration algorithm generates a sequence of policies which are c-regular (a strong stability condition), where c is the cost function under consideration. This result only requires the existence of an initial c-regular policy and an irreducibility condition on the...

A Simulation-Based Policy Iteration Algorithm for Average Cost Unichain Markov Decision Processes

In this paper, we propose a simulation-based policy iteration algorithm on Markov decision process (MDP) problems with average cost criterion under the unichain assumption, which is a weaker assumption than found in previous work. In this algorithm, 1) the problem is converted to a stochastic shortest path problem and a reference state can be chosen as any recurrent state under the current poli...


Journal

Journal title: Journal of Mathematical Analysis and Applications

Year: 2008

ISSN: 0022-247X

DOI: 10.1016/j.jmaa.2007.06.071